Phrase-Based SMT with Shallow Tree-Phrases
Abstract
In this article, we present a translation system that builds translations by gluing together Tree-Phrases, i.e., associations between simple syntactic dependency treelets in a source language and their corresponding phrases in a target language. The Tree-Phrases used in this study are syntactically informed and have the advantage of grouping source and target material whose words need not be adjacent. We show that the phrase-based translation engine we implemented benefits from Tree-Phrases.
Similar resources
Practical Approach to Syntax-based Statistical Machine Translation
This paper presents a practical approach to statistical machine translation (SMT) based on syntactic transfer. Conventionally, phrase-based SMT generates an output sentence by combining phrase (multiword sequence) translation and phrase reordering without syntax. On the other hand, SMT based on tree-to-tree mapping, which involves syntactic information, is theoretical, so its features remain un...
Syntax-based Statistical Machine Translation
In its early development, machine translation adopted rule-based approaches, which can include the use of language syntax. The late 1980s and early 1990s saw the inception of the statistical machine translation (SMT) approach, where translation models can be learned automatically from a parallel corpus rather than created manually by humans. Initial SMT models were word-based and phrase-based, ...
Introducing Non-Syntactic Phrases into a Syntax-Based Machine Translation System
The dominance of traditional phrase-based statistical machine translation (SMT) models (Koehn, Och, and Marcu, 2003) has recently been challenged by the development and improvement of a number of newer translation models that explicitly take into account the syntax of the sentences being translated. One simple approach to incorporating syntax is to limit the phrases learned by a standard SMT tra...
Offline Extraction of Overlapping Phrases for Hierarchical Phrase-Based Translation
Standard SMT decoders operate by translating disjoint spans of input words, thus discarding information in the form of overlapping phrases that is present at phrase extraction time. The use of overlapping phrases in translation may enhance fluency in positions that would otherwise be phrase boundaries, may provide additional statistical support for long and rare phrases, and may generate ...
Phrase Alignment for Integration of SMT and RBMT Resources
A novel approach is presented for extracting syntactically motivated phrase alignments. In this method we can incorporate conventional resources such as dictionaries and grammar rules into a statistical optimization framework for phrase alignment. The method extracts bilingual phrases by incrementally merging adjacent words or phrases on both source and target language sides in accordance with ...